Molecular differentiation between complete and incomplete responders to neoadjuvant therapy in rectal cancer

Background Neoadjuvant chemoradiotherapy (nCRT) is the standard treatment for locally advanced rectal cancer, but only 20–40% of patients completely respond to this treatment. Methods To define the molecular features that are associated with response to nCRT, we generated and collected genomic and transcriptomic data from 712 cancers prior to treatment from our own data and from publicly available data. Results We found that patients with a complete response have decreased risk of both local recurrence and future metastasis. We identified multiple differences in DNA mutations and transcripts between complete and incomplete responders. Complete responder tumors have a higher tumor mutation burden and more significant co-occurring mutations than the incomplete responder tumors. In addition, mutations in DNA repair genes (across multiple mechanisms of repair) were enriched in complete responders and they also had lower expression of these genes indicating that defective DNA repair is associated with complete response to nCRT. Using logistic regression, we identified three significant predictors of complete response: tumor size, mutations within specific network genes, and the existence of three or more specific co-occurrent mutations. In incompletely responder tumors, abnormal cell-cell interaction and increased cancer associated fibroblasts were associated with recurrence. Additionally, gene expression analysis identified a subset of immune hot tumors with worse outcomes and upregulated of immune checkpoint proteins. Conclusions Overall, our study provides a comprehensive understanding of the molecular features associated with response to nCRT and the molecular differences in non-responder tumors that later reoccur. This knowledge may provide critical insight for the development of precision therapy for rectal cancer.


Background
Colorectal cancer is the third leading cause of cancer death in the United States [1].Rectal cancer constitutes about one-third of cases.Stage 2 and 3 rectal cancers predominantly undergo treatment with neoadjuvant chemoradiation (5FU and 2 Gy radiation for six weeks) and often chemotherapy (FOLFOX), followed by surgical excision using the total mesorectal excision technique [2].Responses to this neoadjuvant treatment vary substantially [3], with complete response rates between 20-40% [4].Complete responders have reduced risk of local recurrence and metastasis [5,6] and as demonstrated in a recent multicenter trial, these patients may not need to undergo surgical excision [4].The lack of predictors of response remains a signi cant obstacle to improving treatment outcomes.
Prior research has assessed clinical predictors of complete response, including increased serum carcinoma-embryonic antigen (CEA), imaging-detected extra-mural venous invasion and tumor size, primary tumor and regional lymph nodes staging (T and N stages) [7].Clinical predictors that predict overall worse outcome generally predict worse response to neoadjuvant therapy, but no prior study has identi ed genomic predictors that are used clinically to predict which patients will be complete responders to neoadjuvant treatment.
As technology advances, studies have begun to assess alterations in DNA, RNA, and individual proteins as well as immune cells as predictors of response [8][9][10][11][12].Smaller studies of individual or small groups of genetic alterations have yielded mixed results.A recent meta-analysis that assessed the prognostic role of speci c genomic mutations (RAS, TP53, BRAD, PIK3CA, and SMAD4) in predicting complete response found that only KRAS mutation was signi cantly associated with incomplete response [13].Most of these studies had small sample sizes (< 30 patients) and didn't integrate RNA and DNA sequencing data [3,14].The larger dataset from the Cancer Genome Atlas (TCGA) rectal cancer cohort excludes treated tumors and thus, this is unable to address the question of why patients respond differently [15].A recent study utilizing genomic data on 738 rectal cancer patients provided a summary of genomic data but lacked in-depth integrated analysis assessing for predictors of response [16].
Our study combined genomic sequencing data from pre-treatment biopsies of 20 rectal cancer patients from our group with publicly available data from 692 patients, excluding patients with stage 4 disease and those treated with chemotherapy only [16].Because our data was collated from multiple studies, it included patients who had a variety of treatments, but all underwent treatment with 5-uorouracil plus long course radiation as part of their treatment.We performed an in-depth integrated analysis to identify response-associated alterations and features linked to future recurrence.Our ndings revealed multiple differences between CR and ICR patients, including distinct mutation networks, DNA repair alternations, tumor mutation burden (TMB), and immune cell in ltrate.We performed logistic regression to identify predictors of complete response and assessed ICR tumors for recurrence-associated features and identi ed that recurrent tumors had decreased immune cells, increased cancer-associated broblasts, and mutations affecting TGF-ß signaling and cell-cell interaction pathways.Overall, we present a novel and comprehensive evaluation of predictors of rectal cancer outcomes after neoadjuvant therapy.

Patients
For this study, we identi ed and analyzed available data for patients who had pre-treatment tumor biopsies with subsequent DNA and/or RNA sequencing of their tumor, received chemoradiation, had subsequent surgical resection or clinical followup as part of a trial, and had available data regarding tumor response to treatment.Tumors were from four cohorts: University of Michigan (UM) (20 patients), Timing of Rectal Cancer Response to Chemoradiation (TIMING), ACOSOG Z6041, and Memorial Sloan Kettering (MSK) Research and Clinical cohorts (691 patients) were included, totaling 711 patients with con rmed diagnosis of rectal adenocarcinoma (Fig. 1).The UM patients underwent treatment with 5FU and 54Gy radiation over 5.5 weeks (chemoradiation) followed by surgery 8-12 weeks later.The TIMING trial patients got chemoradiation followed by differing numbers of cycles of FOLFOX (5FU, oxaloplatin, and leukovorin) prior to surgical resection.The patients in ACOSOG Z6041 cohort were treated with neoadjuvant chemoradiotherapy consisting of 50 Gy of radiation, capecitabine and oxaliplatin followed by local excision.The MSK cohort got chemoradiation either before or after 8 cycles of FOLFOX and patients who appeared to have a clinical complete response on endoscopy and imaging were followed.
For our study, we considered patients to have a CR if they underwent surgical resection and had a CR or if they were a persistent clinical CR in the MSK dataset but did not undergo surgical resection.All genomic and associated clinical data for MSK and TIMING were downloaded from the cBioPortal (https://www.cbioportal.org/study/summary?id=rectal_msk_2022). Raw RNA sequencing data from all cohorts were downloaded from GEO under the accession numbers: GSE242786 and GSE209746.

Whole Exome Sequencing
For the UM cohort, genomic DNA samples were fragmented to a target size of 300 bp using the Covaris S2 system.The samples were end-repaired, poly A-tailed, and ligated with custom adapters using the NEBNext DNA Library Prep kit.These adapters had 6 bp barcodes designed with BARCRAWL software 48   and synthesized by Integrated DNA Technologies.After ligation, they were size selected to 300 bp on a 2% agarose gel, retaining 1-mm gel slices.The samples were isolated from the gel using the Qiagen QIAquick system.Ligation products (10 or 15 µl) were enriched using the Phusion master mix kit and underwent 14 PCR cycles.PCR products were puri ed using AmpureXP beads.Library quality was checked using the Agilent Bioanalyzer and qPCR.The libraries were captured using the Nimblegen SeqCap EZ V3 Exome Enrichment Kit and sequenced on the Illumina HiSeq 2000 with 100 bp paired-end reads using v3 reagents.

Somatic Mutation Detection
For detecting somatic single nucleotide variant (SNV) and insertions/deletions (INDELs), the best practice guidelines of GATK (v4.0.11) [17] (https://software.broadinstitute.org/gatk/bestpractices/workow?id=11146) were followed, and strelka2 (v2.9.10) [18] was employed.Somatic variants with a depth < 30 and Allele Fraction (AF) < 0.1 were excluded.The remaining mutations were annotated using ensembl-vep (v109.3)[19].The VCF les were converted to MAF format using vcf2maf (v1.6.21)[20].Analysis focused on mutations in coding regions.A mutation rate (total mutations/length of coding regions) was calculated for all somatic mutations.A propensity score weighting algorithm (PSW) [21] was applied to balance confounding factors.The propensity score was calculated based on the response type (CR and ICR) using a logistic regression.Samples were re-weighted using the Matching Weight scheme (MW) [22].This process included balanced checking steps that revised the calculation until standardized differences of all covariates were less than 10%, balancing confounding factors between CR and ICR patients.Next, mutation rates between the two balanced groups were compared using a weighted t-test.

Signi cantly Mutated Genes Detection
MutSig2CV [23] was applied to identify signi cantly mutated genes in tumors with complete response and incomplete response to nCRT.Signi cantly mutated genes were mutated beyond random expectations, considering the background mutation rate and mutational processes.Genes with q-value < 0.05 in CR tumors, but > 0.05 in ICR tumors were de ned as CR-speci c, and vice versa for ICR-speci c genes.Functional enrichment analysis of SMGs was performed using Enrichr [24].Signi cantly enriched Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were identi ed with a threshold p-value < 0.05.
The mutated genes with > 5% mutation frequency were selected for comparison between CR and ICR, including recurrent and nonrecurrent ICR tumors.Lollipop plots were generated using the maftools (v3.15) [25] package.Co-occurring and mutually exclusive mutation pairs were analyzed using the maftools package.Interactions of top 25 mutated genes were visualized using the ggplot2 package.

RNA Sequencing
From our cohort, RNA was isolated from fresh frozen pre-treatment biopsy samples using the Allprep kit (Quiagen).Total RNAs were used to generate mRNA sequencing libraries and sequenced on the Illumina HiSeq 2000 platform.The TIMING and MSK samples were treated similarly [16].
For the validation of subset ICR tumors using TCGA rectal tumors (READ), the gene expression data processed using the second TCGA analysis pipeline (RNAseqV2) were collected.Only tumors in stage II and III with consensus molecular subtypes (CMS) subtypes and microsatellite instability (MSI) status, were included.The hierarchical clustering method was used to identify cluster 4-like tumors based on the DEGs between cluster 4 and other clusters.

Survival Analyses
The "Survival" package (v3.3-1) was used to build a standard survival object.To evaluate gene expression on survival, Kaplan-Meier survival curves were generated.Patients were strati ed into high and low expression groups based on mean expression levels.Survival curves for these groups were compared, and the log-rank test was used to evaluate signi cant differences in survival durations (pvalue < 0.05).

Logistic Regression Model for Complete Response
To identify potential predictors of tumor response, a logistic regression model [38] was t to clinical and genetic data.First, bivariate comparisons using Fischer's exact test were made between tumor response and binary indicators for gene mutations and co-occurring mutations, with factors at p-value < 0.1 combined into variables measuring counts of particular mutated genes and counts of co-occurring mutations.Bivariate comparisons were also made between tumor response and patient demographics (age, gender, race, BMI) and tumor characteristics (size, stage, MSI status).A logistic regression model for complete response was then t including tumor size, tumor stage, and the variables indicating the number of identi ed gene mutations of interest and number of co-occurring mutations of interest.Count variables were explored as both continuous and categorical and a nal model was selected with the highest R 2 Tjur and lowest AIC.The nal functional form of mutation count variables consisted of a binary indicator for any mutation in a gene of interest and a categorical variable indicating 0, 1 to 2, or 3 or more co-occurring mutations of interest.Model performance metrics including sensitivity, speci city, positive prediction value (PPV), and negative prediction value (NPV) were evaluated on the training data using the R package pROC [39].

Statistical Analysis
Statistical analyses were conducted using R Studio (v1.2.1335) and R (v3.6.1).Speci c R packages and methods used in each step are described in the preceding sections.

Results
This study analyzed 711 stage 2 and 3 rectal cancer patients who received chemoradiation post-biopsy from four cohorts: 20 from University of Michigan (UM), and 691 from the ACOSOG (n = 25), Timing (n = 68) and Memorial Sloan Kettering (MSK, n = 598) groups.We excluded 57 patients who only received chemotherapy and 15 with stages I or IV tumors.Subsequently, 336 patient tumors with available DNA or RNA sequencing data were used for our genomic analyses (Fig. 1A).Those showing pathological or clinical complete response were categorized as complete responders (CRs), while the rest were classi ed as incomplete responders (ICRs).Signi cant clinical differences were identi ed between CR (n = 124) and ICR groups (n = 267) (Supplementary Fig. 1).Notably, ICR patients exhibited larger tumor sizes (Mann Whitney U test, p-value < 0.05, Fig. 1B), higher recurrence rates (Chi-square test, p-value < 0.05, Fig. 1C-D) and deceased overall survival (Chi-square test, p-value < 0.05, Fig. 1E-F).Furthermore, 79.4% of ICR patients were lymph nodes positive (N1, N2, N+), compared to 70.2% of CRs (Fig. 1G-H).These results align with earlier research linking complete response and reduced recurrence [40].Notably, treatment approach was not signi cantly associated with the type of response; comparison between CR and ICR showed no statistically signi cant difference in response to chemoradiation therapy whether it was administered as a consolidation or as the only form of treatment (Chi-square test, p-value = 0.373) (Supplementary Table 1).In addition, our analysis indicated that microsatellite instability (MSI) status did not signi cantly distinguish between CR and ICR outcomes (Chi-squared test, p-value 0.233) (Supplementary Table 1).

Genomic characteristics of response to nCRT
Prior study has indicated that complete response to nCRT is associated with a high tumor mutation burden (TMB) in rectal cancer [41] but potential confounding factors were not accounted for such as BMI, gender, and age.To do so, we applied the propensity score weighting algorithm in the TMB comparison (Supplementary Fig. 2A).Even after this adjustment, CR tumors exhibited signi cantly higher mutation rates (Fig. 2A).Importantly, this increased mutation rate in CR tumors was not in uenced by the presence of microsatellite instability (MSI) in tumors (Supplementary Fig. 2B).This nding implies that TMB may be an important factor for predicting the response to neoadjuvant therapy.
In addition to TMB, we also aimed to identify the relevant genetic features associated with response.We rst focused on the frequently mutated genes and compared their mutation frequencies between CR and ICR tumors.The commonly mutated colorectal cancer-related genes, such as TTN, APC, TP53, and KRAS were found in a high percentage of rectal tumors (Fig. 2B, Supplementary Fig. 2C-D).However, genes like APC, TP53, KRAS, PIK3CA and FBXW7 displayed higher mutation frequencies in ICR tumors (Supplementary Fig. 2D).Interestingly, genes with higher mutation frequencies in CR tumors were enriched in several DNA repair pathways (Fig. 2C), including Mismatch repair (MMR), Base excision repair (BER), Homologous recombination (HR), and Nucleotide excision repair (NER).This observation suggests a potential association between complete response to nCRT and DNA repair de ciency.
Next, we used MutSig2CV [23] to identify signi cantly mutated genes (SMGs) for CR and ICR tumors, respectively.MutSig2CV identi es genes that are mutated more frequently than expected by chance, considering three factors: background mutation rate, localized hotspots, and vertebrate conservation.We discovered 36 SMGs in ICR tumors and 17 in CR tumors (Fig. 2D, Supplementary Table 2).Several well-known cancer driver genes, including TP53, APC, KRAS, NRAS, ARID1A, and PIK3CA, were present in both groups.Notably, consistent with the frequently mutated genes in CR tumors (Fig. 2D), the CRspeci c SMGs were enriched in DNA repair pathways, while the ICR-speci c SMGs were predominantly associated with oncogenic signaling pathways (Fig. 2E).Furthermore, we speci cally assessed for mutations in genes associated with various mechanisms of DNA repair.After balancing confounding factors, we observed higher mutation frequencies in MSH3, MLH1, PMS1, BRCA1, BARD1, POLD1, and POLE in CR than ICR tumors (Fig. 3A).Overall, a signi cant higher proportion of CR patients (52% vs. 34%, p-value < 0.05) had DNA repair-related mutations compared to ICR (Fig. 3B-3C).These DNA repair alterations were not related to MSI status as they were rarely found in MSI tumors and MSI-H status was very similar and rare between CR and ICR tumors (2/104 CR and 2/260 ICR tumors).
In cancer, certain mutations exhibit evolutionary dependencies [42], re ecting interactions between genes.Alteration in one gene may in uence the mutation pattern of another, thereby creating a complex network of mutually exclusive or co-occurring mutation pairs [43].To explore whether speci c genetic interactions are associated with the response to nCRT, we detected co-occurring and mutually exclusive mutation pairs within CR and ICR tumor groups.Interestingly, CR tumors demonstrated more signi cantly co-occurring mutations in cancer-related genes, with 70 co-occurring pairs in CR tumors compared to 38 pairs in ICR tumors (Fig. 3D and 3E, Supplementary Data 1, Fisher's exact test, p-value < 0.05).In CR tumors, genes with co-occurring mutations were primarily involved in DNA damage response, double-strand DNA break (DSB) response, and nucleotide excision repair (NER) (Fig. 3F).In contrast, the gene network with co-occurring mutations in ICR tumors was predominantly associated with tumorigenic pathways, such as RAS signaling, autophagy, apoptosis, and anti-apoptosis (Fig. 3G).
Additionally, we identi ed 7 signi cantly mutually exclusive mutation pairs in ICR tumors and 3 in CR tumors (Fisher's exact test, p-value < 0.05) (Supplementary Data 1).Alterations in KMT2D and TP53 were observed in both CR and ICR groups.The mutually exclusive alterations in APC and RNF43 has been reported in microsatellite-unstable colorectal adenocarcinomas [44].Mutations in PTEN and TP53 were also exclusive in a subgroup of colorectal cancer [45] and breast cancer [46].Similar exclusivity between TP53 and ARID1A has been observed in other cancer types [47].

Transcriptomic differences and tumor immune cell in ltrate in response to nCRT
To compare the transcriptomic differences between CR and ICR tumors, we merged the RNA sequencing data from 114 patients from the MSK cohort and 7 patients from the UM cohort.We identi ed 678 downregulated and 1018 upregulated genes in CR tumors (p-value < 0.05 and log2FC > 0) (Fig. 4A).Consistent with the study from Chatila et.al [16], IGF2 and L1CAM were distinctly overexpressed in ICR tumors.Gene Set Enrichment Analysis (GSEA) revealed that genes with relatively low expression in CR tumors were signi cantly enriched in several DNA repair-associated pathways, including nonhomologous end joining (NHEJ), NER, MMR, homologous recombination (HR), and BER pathways (Fig. 4B).This is consistent with our mutation data analysis where DNA repair genes were predominantly mutated, leading to DNA repair de ciency.CR tumors also demonstrated overexpression of numerous genes that were signi cantly enriched in immune-related pathways (Fig. 4B, Supplementary Fig. 3).In addition to the association of tumorigenic pathways with mutation data, gene expression analysis suggests that increased immune function in CR tumors may be critical to nCRT as well.
Emerging evidence shows that tumor-in ltrating immune cells (TICs) are associated with sensitivity to nCRT [48,49].To investigate the differences in TICs between CR and ICR tumors, we employed TIMER [32] and subsequently assessed the differential fractions/enrichment scores.We observed that CR tumors manifested a signi cantly reduced in ltration of T cells, particularly CD8 + T cells (Mann Whitney U test, p-value < 0.05) (Fig. 4C-D).In a contrasting observation, although not reaching statistical signi cance (Mann Whitney U test, p-value > 0.05), ICR tumors exhibited lower fractions/enrichment scores across a spectrum of immune cells, including B cells, Neutrophil, M2 macrophages, dendritic cells, neutral killer (NK) cells, Neutrophils, and monocytes (Supplementary Data 2).

Predictors of complete response to neoadjuvant therapy in rectal cancer
To elucidate both clinical and genomic predictors of complete response (CR), we selected patients with clinical stage II and III who received neoadjuvant therapy and for whom targeted or whole exome DNA sequencing data was available.This resulted in 233 ICR and 101 CR patients for analysis.The bivariate analysis revealed that the only clinical variable that differed between CR and ICR patients was median tumor size (CR 4.2 cm (Q1-Q3) 3.2-5.2) vs ICR (4.7 cm (3.8-6.0))(Supplementary Table 1).Treatment group did not predict whether a patient went on to be a CR or ICR (Supplementary Table 1).We next assessed whether presence of particular non-synonymous deleterious mutations differed between CR and ICR and identi ed 10 alterations that were more common in CR patients with p-values of 0.1 or less (Supplementary Table 3).Network analysis indicated interactions between all 10 genes (Supplementary Table 4).These interactions, such as co-expression, were identi ed through curated databases (Supplementary Fig. 4).Because CR patients had more co-occurring mutations when the genomic analysis was performed (Fig. 3D-E), we did bivariate comparisons between all identi ed co-occurring mutations in CR and ICR patients and identi ed a subset of 95 that were more common in CR patients with p-values < 0.1 (Supplementary Data 3).Next, we t a logistic regression model for CR using tumor size (cm), clinical stage, a mutation in any of the 10 network genes, and number of co-occurring mutations grouped as 0, 1-2, or 3 or more co-occurring mutations (Supplementary Data 4).After selecting the model with the highest R 2 Tjur and lowest AIC, we observed that predictors of CR included smaller tumor size (OR: 0.79, 95%CI: 0.66-0.94),mutation of any of the network genes (OR: 2.57, 95%CI: 1.37-4.77),and presence of more co-occurring mutations (likelihood ratio test p-value < 0.001).Compared to patients with zero co-occurring mutations, those with 1 to 2 mutations exhibited a 1.57 (95%CI: 0.67-3.51)higher odds of CR and those with 3 or more co-occurring mutations showed an even greater 8.47 (95%CI: 2.74-32.26)times higher odds of CR (Table 1).

Genomic and molecular determinants of incomplete response to nCRT with subsequent recurrence
Given that approximately one-third of ICR tumors subsequently recurred, we explored the molecular features associated with recurrence in these pre-treatment biopsy samples, aiming to enhance our understanding and explore potential strategies speci cally to target these recurrent ICR tumors.Although recurrent ICR patients had signi cantly (the log rank test, p-value < 0.0001) poorer survival (Supplementary Fig. 5A), there was no signi cant difference in their tumor mutation burden, altered genome fractions, or MATH scores [50] (Supplementary Fig. 5B-D).However, these tumors that later reoccurred exhibited a higher frequency of mutations in cancer driver genes, including APC, TP53, KRAS, and TTN (Fig. 5A).Enrichment analysis revealed these tumors had more mutations in pathways related to TGF-ß signaling, ECM-receptor interaction, and cell-cell interaction (Fig. 5B).In addition, gene expression pro les indicated a downregulation of cell-cell interaction genes in recurrent ICR tumors (Fig. 5C).Moreover, more Cancer Associated Fibroblasts (CAFs), known to in uence colorectal cancer (CRC) metastasis [51], were detected in the recurrent ICR tumors (Fig. 5D).Consistent with enriched mutations and decreased expression of cell-cell interaction genes, targeting CAFs may be a potential therapeutic strategy to prevent metastasis in CRC [52,53].Additionally, recurrent tumors demonstrated reduced in ltration of NK cells, dendritic cells, and Regulatory T Cells (Tregs) (Fig. 5E-G).

Characteristics of subgroups of the incomplete responders
Given the evidence [54] that gene expression has a higher predictive capacity for drug response than other factors such as mutation signature, copy number variation, and methylation, we aimed to identify ICR patient subgroups based on gene expression patterns.Our objective was to understand the molecular characteristics of these subgroups, potentially guiding alternative therapeutic approaches.Through unsupervised hierarchical clustering of 76 ICR patients with whole exome and transcriptome sequencing and outcome data (Supplementary Data 5), we identi ed four groups with distinct gene expression pro les (Fig. 6A, Supplementary Fig. 6A).
Previous studies have reported that DUOX2 and SSH1 exhibited signi cantly higher expression and promoted the progression and metastasis of CRC [58,59].Additionally, FRMD6, which is an upstream regulator of Hippo signaling, serves as a marker of the CMS4 subgroup in CRC [60, 61].
Furthermore, the immune microenvironment of Cluster 4 tumors was characterized by increased in ltration of various immune cells, including neutrophils, monocytes, macrophages, activated myeloid dendritic cells, naive CD4 + T cell, as well as Tregs cells (Supplementary Fig. 6K-P).The immune-rich microenvironment of Cluster 4 may indicate that there is an opportunity for immunotherapy for this speci c subset of tumors.Of note, we observed signi cant overexpression of PDCD1 (PD-1) in Cluster 4 (Fig. 6E), highlighting its potential as a therapeutic target [62, 63].
To validate these ndings, we applied the differential expression signature between Cluster 4 and other clusters to 42 rectum adenocarcinoma (READ) samples in TCGA.We identi ed that a subgroup, termed Cluster4-like, demonstrated a similar expression pro le to Cluster 4 (Supplementary Fig. 6Q-R).
Interestingly, the tumors in the Cluster4-like group (n = 10) also had poor outcomes, with 80% being CMS4 subtype (Supplementary Fig. 6S).Immune checkpoint genes, including PD-1, CD274 (PD-L1), CTLA4, HAVCR2 (TIM3), and LAG3, were overexpressed in Cluster4-like tumors (Fig. 6F).These tumors also displayed increased immune cell in ltration (Fig. 6G).These differences mirrored Cluster 4 characteristics.Collectively, our ndings suggest that greater attention should be given to the subsets of patients with high immune cell in ltration and overexpression of immune checkpoint genes, as these patients might bene t from immune checkpoint inhibitors (ICIs).

Discussion
Rectal cancer treatment is evolving.Recent clinical studies have focused on optimizing treatment by altering the order of chemotherapy and chemoradiotherapy relative to surgery and have identi ed patients where organ preservation appears feasible which will lead to decreased surgery.Yet, molecular predictors of treatment response remain unidenti ed [4].In this current study, we have gathered public and institutional genomic data from pre-treatment biopsies of patients undergoing chemoradiotherapy, followed by clinical response assessment or surgery.Our in-depth comparative analysis sought to answer the 2 most fundamental clinical questions in the care of rectal cancer patients: how CR and ICR patient tumors differ prior to treatment and how tumors from patients who later will recur differ from those who don't.We identi ed multiple important differences, some of which are potentially targetable.We utilized advanced bioinformatics approaches for these comparisons, including the use of the propensity score weighting to balance confounding factors that may have been associated with genetic changes, such as BMI, age, gender, tumor size, and N and T stages.
Our comprehensive analysis has identi ed important molecular and clinical factors in uencing the response to neoadjuvant chemoradiotherapy (nCRT).We identi ed that ICR patients presented with larger tumor sizes and go on to have higher recurrence rates compared to CR patients.CR tumors demonstrated a higher mutation burden (TMB) and frequent mutations in genes linked to DNA repair pathways, while ICR tumors had higher mutation frequencies in oncogenic signaling pathways.Furthermore, CR tumors showed a higher rate of co-occurring mutations, enrichment in DNA repair alterations and DNA damage response networks.Transcriptomic analysis revealed that ICR tumors overexpressed DNA repair genes, while CR tumors have increases in genes in immune pathways.The immune microenvironment of CR tumors had signi cantly fewer in ltrating T cells, particularly CD8 + T cells.Using logistic regression analysis, we identi ed potential CR predictors for future studies, including alterations in speci c gene networks and multiple co-occurring mutations.Our study also revealed unique characteristics for ICR tumors that later recurred, including increased TGF-ß signaling and impaired cell-cell interaction.Lastly, we identi ed four distinct ICR subgroups based on gene expression, each suggesting different therapeutic strategies.Speci cally, the Cluster-4 group, with a CMS-like tumor expression pro le, demonstrated the greatest resistance to nCRT and most immune in ltration, suggesting the potential for targeted checkpoint inhibitor therapy to improve the outcome in this challenging patient subgroup.
Previous retrospective studies have reported that de ciency in mismatch repair (MSI-H status) have similar complete response rates to nCRT as mismatch pro cient tumors but may have improved downstaging with treatment [64,65].In our study, we observed that mutations in multiple DNA repair pathways were enriched in CR patients.This was not due to differences in MSI-H tumors which were only about 2% of tumors in both CR and ICR groups.The mutations we identi ed in DNA repair align with decreased expression of DNA repair mRNAs.In contrast, ICR tumors exhibit a increases in DNA damage repair gene expression.Our ndings underscore that defective DNA repair mechanisms, including dMMR, dNER, dBER, and dHRR, relate to a complete response to nCRT.Prior studies have not assessed the spectrum of DNA repair pathways as we have done here.The activation of DNA repair in tumor cells likely is in response to DNA damage induced by chemoradiotherapy and when the repair is effective, this will result in ICR.Moreover, a higher frequency of co-occurring mutations in CR tumors suggests the potential of synthetic lethality, a condition where concurrent genetic events lead to cell death [66].Our data indicate that this phenomenon is strongly correlated with the response to nCRT, again indicating that improved response to treatment aligns with an inability to overcome tumor damage from treatment.
Our study suggests that evaluating mutation status in predictor genes and speci c co-occurring mutations may predict CR status, potentially guiding surgical decision making.
Rectal cancer is a complex and molecularly heterogeneous disease [67].For ICR tumors, our objective was to identify subsets that may be responsive to alternative treatments.We rst investigated the molecular features of recurrent ICR tumors.Compared to nonrecurrent ICR tumors, recurrent counterparts had no signi cant differences in TMB or genome alterations (Supplementary Fig. 5B-C).However, ICR tumors which later recurred, displayed a higher frequency of mutations in cell-cell interaction pathways, including focal adhesion, adherens junction, TGF-ß signaling, and ECM-receptor interaction.Moreover, the expression data found diminished activity in these pathways, suggesting impaired cell-cell interaction is characteristic of these tumors.Therefore, strategies targeting this dysregulation may bene t patients with these alterations which we identi ed in ICR tumors that went on to recur.Interestingly, these differences are already identi able in pre-treatment biopsies indicating that tumors harbor alterations which cause subsequent metastasis even at diagnosis.Subsequently, we identi ed a subset of ICR tumors using gene expression pro les and validated these using TCGA data.This patient subset, characterized by poor outcomes, exhibited a reduced mutation burden and genomic alteration but displayed signi cantly increased immune cell in ltration.Moreover, the over-expression of immune checkpoint mRNAs suggests a potential responsiveness to immune checkpoint inhibitors.

Additional Information
Mutation rates and signi cantly mutated genes (SMGs) in CR and ICR tumors.A: Mutation Burden Comparison.Mutation rates are compared between complete responders (CR, n=28) and incomplete responders (ICR, n=109) using a weighted t-test, showing a signi cant difference (p=0.0369).B: Mutational Frequencies in CR vs. ICR Tumors.Bar plot indicates higher mutation frequencies in CR tumors (n=104; microsatellite instability [MSI]=7; no matched control [NMC]=12) compared to ICR tumors (n=260; MSI=7; NMC=27).C: Pathway Enrichment in CR Tumors.Bar plot demonstrates pathways signi cantly enriched with frequently mutated genes in CR tumors.D: Dot plots showing Signi cantly Mutated Genes (SMGs) enriched in CR tumors (lightsteelblue, n=104), ICR tumors (darkseagreen, n=260), and both groups (wheat), q-value < 0.05 after correction for multiple hypothesis testing.E: Enriched pathways for SMGs.Pathways signi cantly enriched for SMGs are shown, with the rich ratio indicating the proportion of SMGs relative to the total number of genes in each pathway.Networks of co-occurring mutated genes in CR tumors (F) and ICR tumors (G). Figure 6

Figure 1 Summary
Figure 1

Table 1
Logistic regression model coe cients for prediction of complete response (CR) to treatment CI: Con dence interval; AIC: Akaike Information Criterion